Automatic Journal Descriptor Indexing Extended to Indexing by UMLS Semantic Type Using the Cosine Document Similarity Measure

نویسندگان

  • Susanne M. Humphrey
  • Thomas C. Rindflesch
  • Alan R. Aronson
چکیده

As background, this paper describes journal descriptor (JD) indexing, based on indexing at the journal level using only 127 descriptors, and applying statistical methods that associate this journal indexing with text words in a training set of MEDLINE® citations. These associations then form the basis for automatic indexing of documents outside the training set. The paper then presents the new technique of semantic type (ST) indexing, based on JD indexing associated with each of 134 ST’s, and applying the standard cosine coefficient measure to compare the similarity between the JD indexing of a document and the JD indexing of each ST. The ST indexing of the document is the list of ST’s ranked in decreasing order of similarity between the JD indexing of the document and the JD indexing of the ST’s. Discussion of the potential usefulness and application of the very general indexing provided by ST’s comprises the remainder of the paper. In particular, it is suggested, with several examples, that ST’s may convey a unique slant of a document’s content not normally represented in standard indexing vocabularies. Use of ST indexing to rank retrieved output is mentioned as a possible application. Notwithstanding the importance of methodology and performance issues, the intent of this paper is to explore questions of the potential utility and applicability of ST indexing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Indexing by Discipline and High-Level Categories: Methodology and Potential Applications

This paper first describes the methodology of journal descriptor (JD) indexing, based on human indexing at the journal level using only 127 descriptors, and applying statistical methods that associate this journal indexing with text words in a training set of MEDLINE® citations. These associations form the basis for automatic indexing of documents outside the training set. The paper then presen...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

A Similarity - based Probability Model for Latent Semantic IndexingChris

A dual probability model is constructed for the Latent Semantic Indexing (LSI) using the cosine similarity measure. Both the document-document similarity matrix and the term-term similarity matrix naturally arise from the maximum likelihood estimation of the model parameters, and the optimal solutions are the latent semantic vectors of of LSI. Dimensionality reduction is justiied by the statist...

متن کامل

The phrase - based vector space model for automatic retrievalof free - text medical documents q Wenlei Mao , Wesley W . Chu

Objective: To develop a document indexing scheme that improves the retrieval effectiveness of free-text medical documents. Design: The phrase-based vector space model (VSM) uses multi-word phrases as indexing terms. Each phrase consists of a concept in the unified medical language system (UMLS) and its corresponding component word stems. The similarity between concepts are defined by their rela...

متن کامل

Bilingual Document Alignment with Latent Semantic Indexing

We apply cross-lingual Latent Semantic Indexing to the Bilingual Document Alignment Task at WMT16. Reduced-rank singular value decomposition of a bilingual term-document matrix derived from known English/French page pairs in the training data allows us to map monolingual documents into a joint semantic space. Two variants of cosine similarity between the vectors that place each document into th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992